Learning Head-modifier Pairs to Improve Lexicalized Dependency Parsing on a Chinese Treebank
Abstract
Because of data sparseness, the lexical information that a treebank provides to a lexicalized parser can be insufficient. This paper proposes an approach that learns head-modifier pairs from a raw corpus and integrates them into a lexicalized dependency parser to parse a Chinese Treebank. Experimental results show that this approach not only enlarged the coverage of bi-lexical dependencies but also significantly improved dependency parsing accuracy.
Similar resources
Cascaded Classification for High Quality Head-modifier Pair Selection
This paper presents a cascaded classification approach for selecting high-quality head-modifier pairs from syntactically analyzed sentences. Experimental results show that the proposed approach achieved an F-score of 76.11% on the selected head-modifier pairs, which was 8.54% higher than the baseline approach that uses sentence length as the selection criterion. In addition, compared with using the he...
Chapter 1: Lexicalized PCFG: Parsing Czech
Recent work in statistical parsing of English has used lexicalized trees as a representation, and has exploited parameterizations that lead to probabilities directly associated with dependencies between pairs of words in the tree structure. Parsed corpora such as the Penn treebank have generally been sets of sentence/tree pairs: typically, hand-coded rules are used to assign head-words to each ...
Bootstrapping Lexicalized Models in Memory-Based Dependency Parsing
Previous research has shown that a lexicalized parsing model incorporating words but no parts-of-speech can outperform a model involving parts-of-speech but no words given enough training data for supervised learning. We show that the same effect can be achieved with a bootstrapping approach, where a mixed model trained on a small treebank is used to parse a larger corpus which is used as traini...
Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates
We use prior and boundary estimates as approximations of the outside probability and base our beam thresholding strategies on these estimates. Lexical items, e.g., head word and head tag, are also incorporated into the lexicalized prior and boundary estimates. Experiments on the Penn Chinese Treebank show that beam thresholding with lexicalized prior works much better than that with unlexica...
Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank
In this paper, we describe our hybrid parsing model for Mandarin Chinese processing. The model combines mainstream constituent and dependency parsing, and the dataset we use is the Tsinghua Chinese Treebank, whose annotation includes both constituents and head information. We show the adaptation of this annotation scheme to the normal constituent structure, dependency structure, and the integration of ...
Publication date: 2007